Computer Science > Computation and Language

arXiv:1906.08237 (cs)
[Submitted on 19 Jun 2019 (v1), last revised 2 Jan 2020 (this version, v2)]

Title: XLNet: Generalized Autoregressive Pretraining for Language Understanding

Authors: Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le
Abstract: With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling. However, relying on corrupting the input with masks, BERT neglects dependency between the masked positions and suffers from a pretrain-finetune discrepancy. In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation. Furthermore, XLNet integrates ideas from Transformer-XL, the state-of-the-art autoregressive model, into pretraining. Empirically, under comparable experiment settings, XLNet outperforms BERT on 20 tasks, often by a large margin, including question answering, natural language inference, sentiment analysis, and document ranking.
Comments: Pretrained models and code are available at this https URL
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as: arXiv:1906.08237 [cs.CL]
  (or arXiv:1906.08237v2 [cs.CL] for this version)
  https://doi.org/10.48550/arXiv.1906.08237
arXiv-issued DOI via DataCite
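
The core idea in the abstract is a permutation language modeling objective. As a rough sketch inferred from the abstract's description (the notation below is not quoted from the paper), the model maximizes the expected autoregressive log-likelihood over sampled factorization orders:

\max_{\theta} \; \mathbb{E}_{\mathbf{z} \sim \mathcal{Z}_T} \Big[ \sum_{t=1}^{T} \log p_\theta\big(x_{z_t} \mid \mathbf{x}_{\mathbf{z}_{<t}}\big) \Big]

where \mathcal{Z}_T is the set of all permutations of the length-T index sequence, \mathbf{z} is one sampled order, and \mathbf{x}_{\mathbf{z}_{<t}} are the tokens preceding position z_t in that order. Because the order varies across samples, each position is eventually conditioned on context from both sides, which is how bidirectional context is learned without corrupting the input with masks.

A toy illustration of the same idea in plain NumPy (ignoring XLNet's two-stream attention and partial prediction; the function name is hypothetical, not taken from the released code):

    import numpy as np

    def permutation_attention_mask(seq_len, rng=None):
        # Sample one factorization order and build a mask in which token i may
        # attend to token j only if j comes earlier in that sampled order.
        rng = np.random.default_rng() if rng is None else rng
        order = rng.permutation(seq_len)       # sampled factorization order z
        rank = np.empty(seq_len, dtype=int)
        rank[order] = np.arange(seq_len)       # rank[i] = position of token i in z
        mask = rank[None, :] < rank[:, None]   # mask[i, j]: may i attend to j?
        return order, mask

Applying such a mask inside a standard Transformer attention layer lets ordinary left-to-right likelihood training cover arbitrary factorization orders, which is the sense in which the objective above stays autoregressive while still exposing bidirectional context.
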

Submission history

From: Zhilin Yang [view email]
[v1] Wed, 19 Jun 2019 17:35:48 UTC (264 KB)
[v2] Thu, 2 Jan 2020 12:48:08 UTC (2,662 KB)